Abstract — In  this  paper,  we  derive  closed  form  approximations 
for  the  capacity  of  a  point-to-point,  deterministic  Gaussian 
MIMO  communication  channel.  We  focus  on  the  behavior  of 
the  inverse  eigenvalues  of  the  Gram  matrix  associated  with  the 
gain  matrix  of  the  MIMO  channel,  by  considering  small 
variance  and  large  power  assumptions.  We  revisit  the  concept 
of  deterministic  MIMO  capacity  by  pointing  out  that,  under 
transmitter  power  constraint,  the  optimal  transmit  covariance 
matrix  is  not  necessarily  diagonal.  We  discuss  the  water  filling 
algorithm  for  obtaining  the  optimal  eigenvalues  of  the 
transmitter  covariance  matrix,  and  the  water  fill  level  in 
conjunction  with  the  Karush-Kuhn-Tucker  optimality 
conditions.  We  revise  the  Telatar  conjecture  for  the  capacity  of 
a  non-ergodic  channel.  We  also  provide  deterministic  examples 
and  numerical  simulations  of  the  capacity,  which  are  discussed 
in  terms  of  our  mathematical  framework. 

Index  Terms — MIMO,  transmitter  optimization,  channel 
capacity,  Telatar  conjecture,  water  filling. 

I.  Introduction 

In  this  paper  we  reexamine  some  of  the  fundamental 
concepts  of  MIMO  channel  capacity,  focusing  on 
deterministic  MIMO  channels.  However,  we  also  consider 
probabilistic  channels,  both  of  the  ergodic  and  non-ergodic 
type.  Our  analysis  shows  that  Telatar's  conjecture  [1]  for  the 
capacity  of  a  non-ergodic  channel  needs  a  similarity 
adjustment  via  unitary  matrices.  We  present  evidence  of  this 
claim  by  extrapolating  from  the  deterministic  case. 

We have found that some of the fundamental results in
MIMO capacity have taken on the status of "folk theorems";
here, we gather these results and make sure they are on firm
mathematical ground.

Furthermore, in the spirit of [2, 3], we give
approximations for the capacity of deterministic MIMO
channels  under  realistic  transmitter  power  constraints  and 
properties  of  the  gain  matrix.  These  approximations  are 
important  for  gleaning  information  about  how  capacity 
behaves,  without  the  necessity  of  numerical  calculations  at 
every  stage  of  the  analysis. 

Digital  Object  Identifier  10.4316/AECE.2011.03001 


To  assist  the  reader,  we  conclude  the  introduction  with  a 
subsection  on  the  notation  used  in  this  paper. 

A.  Notation 

All vectors and matrices are complex, unless noted otherwise. Vectors are denoted by bold lower-case letters a, matrices are denoted by bold upper-case letters A. The determinant of A is det(A), rank(A) is the rank, tr(A) is the trace, and A* denotes the conjugate transpose. We denote the (i,j) entry of A by $A_{i,j}$. For real x, $x^+$ denotes max(x, 0); for complex z, $\bar{z}$ denotes the conjugate of z, and $|z|^2 = z\bar{z}$. The n × n identity matrix is written as $I_n$. Given an n × n matrix A, we denote the spectrum of A, that is, the multiset of eigenvalues $\varepsilon_i$ (possibly with repeated values), as $\mathrm{eig}(A) \triangleq \{\varepsilon_1, \dots, \varepsilon_n\}$. We have a similar multiset $\mathrm{eig}^+(A)$ consisting, with multiplicity, of the positive (if any) eigenvalues of A. If A has only real eigenvalues $\varepsilon_i$, then we may order them in non-increasing order. We denote the multiset of eigenvalues listed in non-increasing order as $\mathrm{eig}_{\downarrow}(A) \triangleq \{\varepsilon_{1\downarrow}, \dots, \varepsilon_{n\downarrow}\}$, where $\varepsilon_{i\downarrow} \ge \varepsilon_{(i+1)\downarrow}$.

We use the notation $\Lambda$ to represent a diagonal matrix. If the diagonal matrix is a diagonalization^1 of the matrix A, then we write $\Lambda_A$. Note that $\Lambda_A$ is, in general, not unique since there may be more than one diagonalization of A. If A is diagonalized by a unitary^2 matrix U, that is $A = U\Lambda_A U^*$, then we have that $\Lambda_A = U^*AU$, which easily tells us that $\mathrm{eig}(A) = \mathrm{eig}(\Lambda_A)$. Therefore, the diagonal entries of $\Lambda_A$ are, with multiplicity, the eigenvalues of A. Note that the spectral theorem [4, Thm. 2.5.6] tells us that any Hermitian^3 matrix is diagonalized by a unitary matrix. The notation $\Lambda(a_1, \dots, a_n)$ (often written in other literature as $\mathrm{diag}(a_1, \dots, a_n)$) denotes the specific n × n diagonal matrix with $a_i$ in the i,i entry. Note that $\Lambda(a_1, \dots, a_n)$ is a specific matrix of the form $\Lambda_A$.

Finally, if the diagonal entries of M are denoted as $M_{i,i}$, then we use the notation $\Lambda_{M_{i,i}}$ to be the diagonal matrix with (i,i) entry $M_{i,i}$.

1 That is, A and $\Lambda_A$ are similar.

2 A square matrix U is unitary iff $U^* = U^{-1}$.

3 A square matrix M is Hermitian (self-adjoint) iff M = M*.


II. MIMO Channel Model

We  consider  a  point-to-point  Gaussian  MIMO 
communication  channel,  where  the  single  sender  employs  T 
transmitting  antennas,  and  R  antennas  are  used  by  the  sole 
receiver. The seminal reference for the analysis of MIMO capacity is Telatar [1]. The channel, in normalized^4 form (see [5, Sec. II.A]), between the sender and the receiver is given by

$y = Hx + n$ (1)

where x is the transmitted T × 1 input vector of the sender, y is the R × 1 received vector, the gain matrix H is R × T, and n is the R × 1 additive white circularly symmetric complex Gaussian noise random vector with covariance matrix $E[nn^*] = I_R$. (Such a random vector has zero mean, and is totally described by its covariance matrix [6, Sec. A.1.3].) We may also denote such a MIMO channel as a (T, R) MIMO channel.

The transmit covariance T × T matrix is defined as the expectation matrix $Q \triangleq E[xx^*]$. The MIMO communication system is assumed to have a total power (for all transmitting antennas) constraint given by the non-negative real number P such that $\mathrm{tr}(Q) \le P$. Note that since $\mathrm{tr}(Q) = \mathrm{tr}(E[xx^*]) = E[\mathrm{tr}(xx^*)] = E[x^*x]$ we may also express our power constraint as $E[x^*x] \le P$.

Telatar  discusses  two  types  of  MIMO  communication 
channels: 

1.  Deterministic  —  The  gain  matrix  H  is  deterministic.  In 
this  scenario  H  is  known  by  both  the  sender  and  the 
receiver. 

2.  Probabilistic  —  The  gain  matrix  H  is  random  and  its 
distribution  is  known  by  the  sender,  and  its  realization  is 
known  by  the  receiver.  Often  the  condition  of  Rayleigh 
fading  is  assumed  (which  means  that  the  magnitudes  of 
the  elements  of  the  random  matrix  H  are  independently 
Rayleigh  distributed),  but  it  does  not  have  to  be  so.  There 
are  two  possible  cases  in  this  scenario. 

a)  Ergodic5  —  The  gain  matrix  H  is  probabilistic, 
and  each  time  the  sender  transmits,  a  realization 
of  H  is  chosen  according  to  its  distribution. 

b)  Non-ergodic  —  The  gain  matrix  H  is 
probabilistic,  but  once  it  is  picked  it  never 
changes. 

Let  us  start  with  the  first  situation. 

2.1  Deterministic  Channel 

We assume that the gain matrix H (the channel state information, hereafter CSI) is known perfectly by the sender and receiver. Given Q, such that $\mathrm{tr}(Q) \le P$, the mutual information^6 [1,5] between the sender and the receiver is

$\mathcal{I}(Q) \triangleq \log\det(I_R + HQH^*)$. (2)

Since the additive noise is normalized to have variance 1, the signal to noise ratio (SNR) is given by P/1 = P.

Mutual information is well-defined in the sense that $\det(I_R + HQH^*) \ge 1$. This is because $\det(I_R + HQH^*) = \prod_i (1 + \varepsilon_i)$, where the $\varepsilon_i$ are the eigenvalues (with multiplicity) of HQH*. Thus, for the mutual information to be well-defined, it suffices to show that the eigenvalues satisfy $\varepsilon_i \ge 0$. First, since Q is a (complex) covariance matrix, it is Hermitian. Second, a covariance matrix [7] is positive semidefinite (psd)^7. Note that since Q is psd, $v^*HQH^*v = (H^*v)^*Q(H^*v) \ge 0$ for every vector v. Then, it follows that HQH* is also psd. Therefore, as required, we have shown that the eigenvalues of HQH* are non-negative.

The  very  important  determinant  identity  [8,  Cor.  18.1.2], 
[7,  1.1 3. Thm. 9]  is  often  used  in  MIMO  papers,  yet  the  proof 
is  hard  to  find.  For  the  sake  of  completeness,  since  some 
tricks  are  used,  we  sketch  the  proof  given  in  our  references 
above. 

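Theorem 2.1. (Determinant Identity) If A is m × n and B is n × m then

$\det(I_m + AB) = \det(I_n + BA)$. (3)

Proof. Since

$$\begin{pmatrix} I_m & -A \\ B & I_n \end{pmatrix} = \begin{pmatrix} I_m + AB & -A \\ 0 & I_n \end{pmatrix}\begin{pmatrix} I_m & 0 \\ B & I_n \end{pmatrix} = \begin{pmatrix} I_m & 0 \\ B & I_n \end{pmatrix}\begin{pmatrix} I_m & -A \\ 0 & I_n + BA \end{pmatrix}$$

we have that

$$\det\begin{pmatrix} I_m + AB & -A \\ 0 & I_n \end{pmatrix}\det\begin{pmatrix} I_m & 0 \\ B & I_n \end{pmatrix} = \det\begin{pmatrix} I_m & 0 \\ B & I_n \end{pmatrix}\det\begin{pmatrix} I_m & -A \\ 0 & I_n + BA \end{pmatrix}.$$

One can easily show by induction that partitioned matrices of the form $\begin{pmatrix} M_1 & M_2 \\ 0 & M_3 \end{pmatrix}$ or $\begin{pmatrix} M_1 & 0 \\ M_2 & M_3 \end{pmatrix}$ have determinant equal to $\det(M_1)\det(M_3)$. Thus the result trivially follows. □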


Corollary 2.1. ([7]) The scalar $\lambda$ is a non-zero eigenvalue of AB iff it is a non-zero eigenvalue of BA.

Proof. Say that $\lambda \ne 0$ is an eigenvalue of AB. Then we know that $\det(\lambda I_m - AB) = 0$. By the determinant identity, $\det(\lambda I_m - AB) = \det(\lambda(I_m - \lambda^{-1}AB)) = \lambda^m \det(I_m - \lambda^{-1}AB) = \lambda^m \det(I_n - \lambda^{-1}BA) = \lambda^{m-n}\det(\lambda I_n - BA)$. So $\lambda \ne 0$ is also an eigenvalue of BA; the rest follows. □
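The determinant identity (3) is easy to check numerically. The following is a minimal sketch, not part of the original derivation: it uses exact rational arithmetic from the Python standard library, and the matrices `A` and `B` (real integers, with m = 2 and n = 3) are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination over exact rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n = len(A)
    result = Fraction(1)
    for c in range(n):
        # Find a row at or below c with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if A[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            A[c], A[pivot] = A[pivot], A[c]
            result = -result  # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return result

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def identity_plus(M):
    """Return I + M for a square matrix M."""
    return [[M[i][j] + (1 if i == j else 0) for j in range(len(M))]
            for i in range(len(M))]

# Hypothetical example matrices: A is 2x3 and B is 3x2.
A = [[1, 2, 0], [-1, 3, 4]]
B = [[2, 1], [0, -1], [5, 2]]

lhs = det(identity_plus(matmul(A, B)))  # det(I_2 + AB)
rhs = det(identity_plus(matmul(B, A)))  # det(I_3 + BA)
print(lhs, rhs)  # 33 33
```

The two determinants agree even though AB is 2 × 2 and BA is 3 × 3, which is exactly what allows (2) to be rewritten in terms of the Gram matrix H*H.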

The determinant identity allows us to express $\mathcal{I}(Q)$ as

$\mathcal{I}(Q) = \log\det(I_T + QH^*H)$. (4)

4 The normalization is done, as in [1], by modifying H, so that the noise has unit power.

5 This term is used in the sense that the temporal average is equivalent to keeping time constant, but averaging over different realizations.

6 All logarithms are base 2, therefore information is measured in bits.

7 We say that an n × n matrix M is psd iff $v^*Mv \ge 0$ for all n vectors v. Note that if v is an eigenvector of M with eigenvalue $\lambda$, then $0 \le v^*Mv = \lambda|v|^2$ so any eigenvalue of M must be non-negative. (Note that the converse is true provided M is also Hermitian.)

The MIMO deterministic capacity, in units of bits per second per Hertz (bps/Hz)^8, is the maximum of the mutual information under a transmitting power constraint

$C \triangleq \max_{Q:\,\mathrm{tr}(Q) \le P} \mathcal{I}(Q)$. (5)

Note that H*H is, by definition, a Gram matrix. What is important, aside from H*H being Hermitian, is that it is also positive semi-definite. This is because $v^*H^*Hv = |Hv|^2 \ge 0$.


Some  observations  are  in  order. 


•  The first observation is that the maximum is well-defined. By this we mean that a supremum of $\mathcal{I}(\cdot)$ exists on the above constrained subset, but one must show that a maximum is actually achieved on the subset. The covariance matrices Q with trace less than or equal to P form the inverse image of a closed set, in the natural matrix topology that T × T covariance matrices inherit from the topology of all T × T matrices. Now, we use the Frobenius norm [4] of Q, $\|Q\| = \sqrt{\mathrm{tr}(QQ^*)} = \sqrt{\mathrm{tr}(Q^2)}$, which is bounded: the eigenvalues of Q are non-negative and their sum, the trace of Q, is bounded by P, so $\mathrm{tr}(Q^2) \le P^2$. Thus, we see that $\{Q : \mathrm{tr}(Q) \le P\}$ is a compact set, so a maximum is attained. Note, one may also make a direct Karush-Kuhn-Tucker (KKT) optimization argument as in [9, Appendix].

•  The second observation is that we can replace the maximization constraint $Q : \mathrm{tr}(Q) \le P$ in (5) with $Q : \mathrm{tr}(Q) = P$. Let us show this by contradiction. We ignore the logarithm, since it is an increasing function, and just concentrate on $\det(I_R + HQH^*)$. Say that the maximum of the determinant is obtained for some $Q'$ with $\mathrm{tr}(Q') = P_1 < P$. We know that $\det(I_R + HQ'H^*) = \prod_i (1 + \varepsilon_i)$, where the $\varepsilon_i$ are, as before, the possibly non-distinct eigenvalues (with multiplicity) of $HQ'H^*$. Consider the matrix $Q'' = \frac{P}{P_1}Q'$. If the $u_i$ are the eigenvalues of $Q'$, then the $\frac{P}{P_1}u_i$ are the eigenvalues of $Q''$, so $\mathrm{tr}(Q'') = P$. Note that $\det(I_R + HQ''H^*) = \det(I_R + \frac{P}{P_1}HQ'H^*) = \prod_i (1 + \frac{P}{P_1}\varepsilon_i)$. Since there must be at least one $\varepsilon_i \ne 0$ (we do not consider cases where H is the zero matrix, and we know that $Q'$ is not the zero matrix), we see that $\prod_i (1 + \frac{P}{P_1}\varepsilon_i) > \prod_i (1 + \varepsilon_i)$. So, by contradiction, we have^9 that

$C = \max_{Q \in \Phi} \mathcal{I}(Q)$ (6)

where $\Phi$ is the set of T × T covariance matrices with trace P. □

8 We have initially factored the bandwidth, in Hz, out of the capacity equation.

9 See footnote 7.

Since the CSI is known, both the sender and receiver know $\mathrm{eig}(H^*H) = \{\mu_1, \dots, \mu_T\}$ and $\mathrm{eig}^+(H^*H) = \{\mu_1, \dots, \mu_\varsigma\}$, where $\varsigma \le \min\{T, R\}$. Note that we could also use the identical (by Cor. 2.1) multiset $\mathrm{eig}^+(HH^*)$.

Since log is an increasing function, maximizing the mutual information $\mathcal{I}(Q)$ can be done by maximizing $\det(I_T + QH^*H)$. We call any Q that maximizes $\mathcal{I}(Q)$ optimal and use the notation $Q^{op}$ for an optimal Q. Since the eigenvalues of Q are denoted as $q_i$, we denote the eigenvalues^10 of a $Q^{op}$ as $q_i^{op}$. Let $\mathrm{eig}(Q) = \{q_1, \dots, q_T\}$; we are interested in this eigenvalue spectrum when Q is optimal.

In  terms  of  historical  precedence,  Telatar  [1]  discusses  the 
idea  of  converting  a  point-to-point  MIMO  channel  into 
orthogonal  parallel,  noninterfering  SISO  channels11.  Also, 
[10]  discusses  "water  filling"  on  the  inverse  eigenvalues  of 
eig+(  H*H). 

We can ignore log(·) in our discussion and simply concentrate on $\det(I_T + QH^*H)$. If Q commuted with H*H, then $\det(I_T + QH^*H)$ could be trivially expressed [4, Thm. 2.5.5], [11, Thm. 3.1]. However, a priori we have no reason to assume such commutativity. Telatar [1, Sec. 3.2] cleverly applies the determinant identity (3) twice and shows that $\det(I_T + QH^*H) \le \det(I_T + \Lambda_Q\Lambda_{H^*H}) = \prod_i (1 + \Lambda_{Q_{i,i}}\Lambda_{H^*H_{i,i}})$. However, there is a slight gap in Telatar's exposition. He has to show that the maximum is obtained for some $Q'$ that is a covariance matrix with trace P, and then show that $\det(I_T + Q'H^*H) = \det(I_T + \Lambda_Q\Lambda_{H^*H})$. We will bridge that gap.

Since H*H is Hermitian, by the spectral theorem [4, Thm. 2.5.6], there exists a unitary matrix U, such that

$H^*H = U^*\Lambda_{H^*H}U$.

Consider $U^*QU$; this matrix is Hermitian and, because $U^* = U^{-1}$, it has the same spectrum as Q. Therefore, $U^*QU \in \Phi$. Note that $U^*QU \in \Phi$ has real non-negative diagonal values and its trace is P. Therefore, $\Lambda_{(U^*QU)_{i,i}} \in \Phi$.

Let $Q' = U^*\Lambda_Q U$, then $Q' \in \Phi$ also^12.

Consider $\det(I_T + Q'H^*H) = \det(I_T + Q'U^*\Lambda_{H^*H}U) = \det(I_T + U^*\Lambda_Q UU^*\Lambda_{H^*H}U) = \det(I_T + U^*\Lambda_Q\Lambda_{H^*H}U)$ and, by the determinant identity, $\det(I_T + UU^*\Lambda_Q\Lambda_{H^*H}) = \det(I_T + \Lambda_Q\Lambda_{H^*H})$.

Thus, we have shown that $\det(I_T + \Lambda_Q\Lambda_{H^*H})$ is not only an upper limit, but it is actually an achievable value for some $Q' \in \Phi$. Therefore it suffices to maximize $\det(I_T + \Lambda_Q\Lambda_{H^*H})$ for $Q \in \Phi$.

Trivially, we have that

$\det(I_T + \Lambda_Q\Lambda_{H^*H}) = \prod_i (1 + q_i\mu_i)$. (7)


10 A priori there may be many such multisets of optimal eigenvalues.

11 SISO stands for Single Input Single Output, the classical Shannon-type channel.

12 A priori there is no reason why Q should be diagonal; this is in contrast to Telatar's confusing statement that the maximizing Q is diagonal. Actually, his statement applies to Hadamard's inequality (which, in fact, is maximized by some diagonal Q), not to the maximization of the mutual information.
Since the $\mu_i$ are fixed, we must determine the corresponding $q_i$ that maximize (7). We denote these maximizing eigenvalues as $q_i^{op}$. Note that

$\max_{Q \in \Phi} \det(I_T + \Lambda_Q\Lambda_{H^*H})$

is well-defined since we may let Q be $\Lambda(q_1^{op}, \dots, q_T^{op}) \in \Phi$.

The optimal eigenvalues are obtained by water filling and applying the Karush-Kuhn-Tucker (KKT) [12, 13] optimality conditions:

$q_i^{op} = \left(\omega - \frac{1}{\mu_i}\right)^+$ (8)

where $\omega$ is the water fill level. If it were not for the $(\cdot)^+$ operation, finding the water level $\omega$ would be trivial: we would just sum the non-zero $q_i^{op}$, set the sum to P and solve for $\omega$. We have the multisets (in non-increasing order) $\mathrm{eig}_{\downarrow}(H^*H)$ and $\mathrm{eig}_{\downarrow}^+(H^*H)$, which give us the unique multiset (in non-increasing order) $\mathrm{eig}_{\downarrow}^+(Q^{op})$. This gives us the unique multiset $\{q_{1\downarrow}^{op}, \dots, q_{T\downarrow}^{op}\}$, where

$q_{i\downarrow}^{op} = \begin{cases} \omega - \frac{1}{\mu_{i\downarrow}} & \text{for } 1 \le i \le \varsigma \\ 0 & \text{for } \varsigma < i \le T. \end{cases}$ (9)

Note that the $q_i^{op}$ correspond to the power allocated on the orthogonal parallel channels discussed above. This is the approach shown in [1, Sec. 3.2] where it is stated that when we have perfect CSI (knowledge of a fixed H)^13, then

$C = \log\det(I_T + (U^*\Lambda(q_1^{op}, \dots, q_T^{op})U)H^*H)$. (10)

Thus,

$C = \log\prod_{i=1}^{T}(1 + q_i^{op}\mu_i) = \sum_{i=1}^{T}\log(1 + q_i^{op}\mu_i)$.^14 (11)

One must perform an iterative algorithm to find first which $q_i^{op}$ are non-zero, and then obtain $\omega$ by summing the non-zero $q_i^{op}$ and setting that sum to P. It is an issue of how much power P is available.

The water filling analogy of the solution algorithm is as follows: We have a water tank with infinitely high sides that is T units across and one unit deep. We place T bricks in the bottom of the tank. The ith brick is 1 unit long and deep and $1/\mu_{i\downarrow}$ units high, and we consider them in ascending order of height $1/\mu_{i\downarrow}$ from brick 1 to brick T. The key to determining $\omega$ is to keep in mind that $\mathrm{tr}(Q^{op}) = \sum_i q_i^{op} = P$. The procedure is as follows:

1. The heights of the bricks are non-decreasing as we go from left to right, i.e., the first brick that we consider is the shortest one. There is always enough water (i.e., power P) to cover the first brick, and therefore $q_{1\downarrow}^{op} = \omega - \frac{1}{\mu_{1\downarrow}} > 0$. We initially set $\omega = P + \frac{1}{\mu_{1\downarrow}}$. The height of $\frac{1}{\mu_{2\downarrow}}$ determines if there is residual power to cover the second brick; that is, if $P + \frac{1}{\mu_{1\downarrow}} \le \frac{1}{\mu_{2\downarrow}}$ we cannot cover the second brick and we stop here, else:

2. We have enough power to cover at least the first two bricks and we now also have that $q_{2\downarrow}^{op} = \omega - \frac{1}{\mu_{2\downarrow}} > 0$. We know the eigenvalues of $Q^{op}$ and adjust the water fill level such that $\omega = \frac{1}{2}\left(P + \frac{1}{\mu_{1\downarrow}} + \frac{1}{\mu_{2\downarrow}}\right)$. We now determine if we have enough power to cover the third brick, that is, if $P \le \frac{2}{\mu_{3\downarrow}} - \left(\frac{1}{\mu_{1\downarrow}} + \frac{1}{\mu_{2\downarrow}}\right)$ we are done, else:

3. We keep iterating this process until all the bricks are covered or we run out of power. The index of the last brick that is covered is $\varsigma \le T$. This gives us a water fill level of

$\omega = \frac{P}{\varsigma} + \frac{1}{\varsigma}\sum_{i=1}^{\varsigma}\frac{1}{\mu_{i\downarrow}}$. (12)

13 Keep in mind that $q_i^{op}$ need not equal $q_{i\downarrow}^{op}$, and similarly for the $\mu_i$ and $\mu_{i\downarrow}$.

14 Which is also equal to $\sum_{i=1}^{T}\log(1 + q_{i\downarrow}^{op}\mu_{i\downarrow})$.
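The three-step procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `water_fill` and its return convention are hypothetical, the input is the list of eigenvalues of H*H (zero eigenvalues are skipped, since their bricks are infinitely high), and P is assumed to be strictly positive.

```python
def water_fill(mu, P):
    """Water filling over the positive eigenvalues mu of H*H with total power P.

    Returns (omega, q, covered): the water fill level, the powers in the
    order of non-increasing eigenvalue, and the number of covered bricks.
    """
    # Brick heights 1/mu in non-decreasing order (largest eigenvalue first).
    heights = sorted(1.0 / m for m in mu if m > 0)
    covered = 1
    omega = P + heights[0]  # step 1: all the power sits on the shortest brick
    # Cover the next brick only while the current level rises above it.
    while covered < len(heights) and omega > heights[covered]:
        covered += 1
        omega = (P + sum(heights[:covered])) / covered  # equation (12)
    q = [omega - h if i < covered else 0.0 for i, h in enumerate(heights)]
    return omega, q, covered

# Hypothetical eigenvalues and power, chosen so the arithmetic is exact.
omega, q, covered = water_fill([4.0, 2.0, 0.5], 0.5)
print(omega, q, covered)  # 0.625 [0.375, 0.125, 0.0] 2
```

In this example the third brick has height 2, which is above the final level 0.625, so it receives no power and the allocated powers sum to P = 0.5.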


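What is interesting about the above equation is that the water fill level is given by the total power P normalized by $\varsigma$, added to the average height of the bricks that are covered. Furthermore, this tells us that the non-zero eigenvalues of $Q^{op}$ are

$q_{i\downarrow}^{op} = \begin{cases} \frac{P}{\varsigma} + \frac{1}{\varsigma}\sum_{k=1}^{\varsigma}\frac{1}{\mu_{k\downarrow}} - \frac{1}{\mu_{i\downarrow}} & \text{for } 1 \le i \le \varsigma \\ 0 & \text{for } \varsigma < i \le T. \end{cases}$ (13)

Since $\omega > (\le)\ \frac{1}{\mu_{i\downarrow}}$ for $i \le (>)\ \varsigma$, we have:

$\omega\mu_{i\downarrow} \begin{cases} > 1 & \text{for } 1 \le i \le \varsigma \\ \le 1 & \text{for } \varsigma < i \le T. \end{cases}$ (14)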


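From (11) we obtain Telatar's result [1, Sec. 3.2]:

$C = \sum_{i=1}^{\varsigma}\log\left(1 + \left(\omega - \frac{1}{\mu_{i\downarrow}}\right)\mu_{i\downarrow}\right) + \sum_{i=\varsigma+1}^{T}\log(1 + 0)$ (15)

$\phantom{C} = \sum_{i=1}^{\varsigma}\log(\omega\mu_{i\downarrow}) + \sum_{i=\varsigma+1}^{T} 0$ (16)

$\phantom{C} = \sum_{i=1}^{T}\left(\log(\omega\mu_{i\downarrow})\right)^+ = \sum_{i=1}^{\varsigma}\log(\omega\mu_{i\downarrow})$. (17)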
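As a quick check of (12) through (17), consider a hypothetical channel with T = 3, P = 1 and ordered eigenvalues $\mu_{1\downarrow} = 2$, $\mu_{2\downarrow} = 1$, $\mu_{3\downarrow} = 1/4$. These values are assumptions chosen only to make the arithmetic transparent; logarithms are base 2, as elsewhere in the paper.

```latex
\begin{align*}
\text{brick heights: } & 1/\mu_{1\downarrow} = 0.5,\quad 1/\mu_{2\downarrow} = 1,\quad 1/\mu_{3\downarrow} = 4,\\
\text{step 1: } & \omega = P + 0.5 = 1.5 > 1 \ \Rightarrow\ \text{brick 2 is covered},\\
\text{step 2: } & \omega = \tfrac{1}{2}(1 + 0.5 + 1) = 1.25 \le 4 \ \Rightarrow\ \varsigma = 2,\\
q^{op}_{\downarrow} &= (0.75,\ 0.25,\ 0), \qquad \textstyle\sum_i q^{op}_{i\downarrow} = 1 = P,\\
C &= \log(1.25 \cdot 2) + \log(1.25 \cdot 1) = \log 2.5 + \log 1.25 \approx 1.32 + 0.32 = 1.64 \text{ bits},\\
C &= \log(1 + 0.75 \cdot 2) + \log(1 + 0.25 \cdot 1) = \log 2.5 + \log 1.25 \quad \text{(the same value via (11))}.
\end{align*}
```

The uncovered third eigenvalue contributes $(\log(1.25 \cdot 0.25))^+ = 0$, in agreement with (14) and (17).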
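Notice that if the $1/\mu_{j\downarrow}$, $j \le \varsigma$, are clustered about their mean $\overline{1/\mu}$, we can approximate

$\frac{1}{\varsigma}\sum_{i=1}^{\varsigma}\frac{1}{\mu_{i\downarrow}} \approx \frac{1}{\mu_{j\downarrow}}$. (18)

We call this the Small Variance Assumption (SVA); using it in (12) and (13), we obtain the SVA value of the water-fill level:

SVA: $\omega \approx \frac{P}{\varsigma} + \frac{1}{\mu_{i\downarrow}}$, $\quad q_{i\downarrow}^{op} \approx \begin{cases} \frac{P}{\varsigma} & \text{for } 1 \le i \le \varsigma \\ 0 & \text{for } \varsigma < i \le T. \end{cases}$ (19)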


2.1.1 SVA Capacity Approximation

If the SVA is assumed then we approximate the capacity as follows

$C \approx C_{SVA} = \sum_{i=1}^{\varsigma}\log\left(1 + \frac{P}{\varsigma}\mu_{i\downarrow}\right)$ (20)

where $\varsigma$ is, as before, the index of the last $1/\mu_{i\downarrow}$ to be "covered with water." This lets us also express $C_{SVA}$ as

$C_{SVA} = \log\det\left(I_T + U^*\left(\frac{P}{\varsigma}\Lambda\underbrace{(*, \dots, *)}_{\varsigma\ \text{ones and}\ T-\varsigma\ \text{zeros}}\right)U\,H^*H\right)$ (21)

where, as before, $H^*H = U^*\Lambda_{H^*H}U$^15. We express the diagonal matrix as having $\varsigma$ ones and $T - \varsigma$ zeros, without fixing their positions, because we do not have control over the ordering of how the eigenvalues of H*H are expressed in the diagonal form $U^*\Lambda_{H^*H}U$.

If all of the $1/\mu_{i\downarrow}$ are approximately equal, then the index $\varsigma$ is set equal to T because there is enough power to flow over all of the $1/\mu_{i\downarrow}$, since they are all at the "same" height $1/\mu$. Furthermore we can drop the index on the optimal eigenvalues of $Q^{op}$. Thus we obtain the Strong Small Variance Assumption (SSVA; we use the word strong since it involves the maximal value $\varsigma = T$):

SSVA: $\omega \approx \frac{P}{T} + \frac{1}{\mu}$, $\quad q^{op} \approx \begin{cases} \frac{P}{T} & \text{for } 1 \le i \le \varsigma \\ 0 & \text{for } \varsigma < i \le T. \end{cases}$ (22)

15 Note that if $T = \varsigma$ the diagonalizing matrices U, U* cancel out in (21) because the diagonal matrix has all ones down the diagonal and hence commutes with all matrices of the proper dimensions.

2.1.2 SSVA Capacity Approximation

If SSVA (which is a special case of the SVA) is assumed, then we approximate the capacity as follows

$C_{SSVA} = \log\det\left(I_T + \frac{P}{T}H^*H\right) = \sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)$. (23)

Note that under the SVA or SSVA the capacity approximation involves an, at worst, suboptimal choice of eigenvalues for Q, therefore $C_{SVA}, C_{SSVA} \le C$.

Two points must be stressed. First, the fact that the $1/\mu_{i\downarrow}$ are clustered around their mean $\overline{1/\mu}$ does not guarantee that the $\mu_{i\downarrow}$ are close to their mean $\bar{\mu}$, and vice versa. Second, it is worth noting that the deterministic MIMO capacity naturally involves the inverse of the eigenvalues of H*H, not the eigenvalues per se.

2.2 Ergodic Channel

Recall that in this case H is probabilistic (it is usually assumed that H represents Rayleigh fading), and every time the channel transmits, a new realization of H is drawn. In this situation expected values of mutual information and capacity are used. Of course, one must be cautious with such terms because Shannon's [14] coding results were not originally given for such concepts, and new thoughts in coding and throughput must be considered. Following the discussion in [1], we define the ergodic capacity $\mathcal{E}$, also in units of bps/Hz, as an expected value:

$\mathcal{E} \triangleq E\left[\log\det\left(I_T + (P/T)H^*H\right)\right]$. (24)

Note that multiplying the matrix H*H on the left by the scalar P/T is equivalent to the matrix multiplication $\Lambda(\underbrace{P/T, \dots, P/T}_{T\ \text{terms}})H^*H$. We have that (24) can be expressed as:

$\mathcal{E} = E\left[\log\prod_{i=1}^{T}\left(1 + \frac{P}{T}\mu_i\right)\right] = E\left[\sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)\right]$ (25)

where the $\mu_i$ are the random eigenvalues, with multiplicity, of H*H. Thus, in the ergodic case the optimal Q is also of the form

$\Lambda(P/T, \dots, P/T) = (P/T)I_T$. (26)
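The ergodic capacity defined in (24) and (25) can be estimated by simulation. The following is a minimal Monte Carlo sketch for a (2,2) channel, under the assumption (not fixed by the text) that the entries of H are i.i.d. circularly symmetric complex Gaussian with unit variance. The function name `ergodic_capacity_2x2` and its parameters are hypothetical; the two eigenvalues of the 2 × 2 Gram matrix H*H are obtained in closed form from its trace and determinant.

```python
import math
import random

def ergodic_capacity_2x2(P, trials=20000, seed=0):
    """Monte Carlo estimate of E[log2 det(I_2 + (P/2) H*H)] for a (2,2) channel."""
    rng = random.Random(seed)
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    total = 0.0
    for _ in range(trials):
        # Assumed fading model: i.i.d. CN(0, 1) entries.
        h = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(2)]
             for _ in range(2)]
        # Trace and determinant of the Gram matrix G = H*H.
        tr = sum(abs(h[i][j]) ** 2 for i in range(2) for j in range(2))
        det = abs(h[0][0] * h[1][1] - h[0][1] * h[1][0]) ** 2
        # Eigenvalues of the 2x2 Hermitian matrix G from its trace and determinant.
        disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
        mu1, mu2 = (tr + disc) / 2.0, (tr - disc) / 2.0
        # Equal power P/T on each eigenvalue, as in (25) with T = 2.
        total += math.log2(1 + P / 2 * mu1) + math.log2(1 + P / 2 * mu2)
    return total / trials

print(ergodic_capacity_2x2(10.0, trials=5000))
```

Because the same seed reproduces the same channel realizations, estimates for different values of P can be compared directly.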


2.3  Non-ergodic  Channel 
2.3.1  Telatar's  conjecture 

As  above,  we  consider  a  probabilistic  H.  However,  in  the 
non-ergodic  case,  once  H  has  been  chosen  it  is  constant. 
Attempting  to  maximize  mutual  information  will  fail, 
because  there  is  a  non-zero  probability  that  the  chosen  H  will 
not  support  a  given  capacity  value.  Nevertheless,  if  we 
incorporate  the  concept  of  outage  probabilities,  then  one  can 
attempt  to  find  a  Q  that  optimizes  the  throughput.  The  details 
for  this  are  in  [1,  Sec.  5.1].  We  now  have  the  famous  Telatar 
conjecture  [1,  Sec.  5.1]: 
Conjecture 2.1. (Telatar) The optimal Q is of the form $\frac{P}{k}\Lambda(\underbrace{1, \dots, 1}_{k\ \text{ones}}, \underbrace{0, \dots, 0}_{T-k\ \text{zeros}})$. The value of k is inversely related to the outage probability.

2.3.2 Adjustment of the Telatar Conjecture

Equation (21) mimics the Telatar conjecture for the non-ergodic channel. Note that (21) calls into question Telatar's choice of the optimal Q in his conjecture. We present a modified version of the conjecture below.

Conjecture 2.2. For a non-ergodic channel, the optimal Q is of the form $U^*\left(\frac{P}{k}\Lambda\underbrace{(*, \dots, *)}_{k\ \text{ones and}\ T-k\ \text{zeros}}\right)U$. The value of k is inversely related to the outage probability and U is unitary.


2.4 Discussion

We see in all three situations that the optimal Q, under either the SVA for the deterministic case or, in general, for the other two cases, is of the form $\Lambda(P/T, \dots, P/T)$ or $\frac{P}{k}\Lambda(\underbrace{1, \dots, 1}_{k}, \underbrace{0, \dots, 0}_{T-k})$.

It remains to be seen how good the SVA approximation is, i.e., we wish to evaluate how far the SVA capacity is from the actual capacity. This approach was taken for binary input discrete memoryless channels in [15, 16, 2, 3]. We turn our attention to this issue in the next section.

III. Quality of the Approximation of Deterministic Capacity

3.1 SSVA revisited

We will assume that we are in the deterministic case and analyze the SSVA a bit further. We assume that:

1. There are T transmitting antennas,

2. Both the sender and the receiver know H,

3. The inverse eigenvalues $1/\mu_i$ of H*H are all approximately equal, and

4. The value of the total transmission power is $P = \mathrm{tr}(Q)$.

From our previous results (23) we know that we can approximate the capacity as

$C_{SSVA} = \sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)$.

3.2 The Case of Large Power P

Now let us examine the situation where the total power $P = \mathrm{tr}(Q)$ satisfies the inequality

$P \gg \sum_{i=1}^{T}\frac{1}{\mu_i}$.

This assures us that there is enough water to cover all of the inverse eigenvalues $1/\mu_i$. We find that the water fill level is

$\omega = \frac{P}{T} + \frac{1}{T}\sum_{i=1}^{T}\frac{1}{\mu_i}$

and that

$q_i^{op} = \frac{P}{T} + \frac{1}{T}\sum_{k=1}^{T}\frac{1}{\mu_k} - \frac{1}{\mu_i}$

with

$C = \sum_{i=1}^{T}\log(\omega\mu_i) = \sum_{i=1}^{T}\log\left(\left(\frac{P}{T} + \frac{1}{T}\sum_{k=1}^{T}\frac{1}{\mu_k}\right)\mu_i\right) = \sum_{i=1}^{T}\left[(\log\mu_i) + \log\left(\frac{P}{T} + \frac{1}{T}\sum_{k=1}^{T}\frac{1}{\mu_k}\right)\right]$.

Since $P \gg \sum_{i=1}^{T}\frac{1}{\mu_i}$ we may approximate the capacity as

$C \approx \sum_{i=1}^{T}\left[(\log\mu_i) + \log\left(\frac{P}{T}\right)\right] = \sum_{i=1}^{T}\log\left(\frac{P}{T}\mu_i\right)$.

Thus we have the Large Power Assumption (LPA), which has an approximate capacity of

$C \approx C_{LPA} = T\log\left(\frac{P}{T}\right) + \sum_{i=1}^{T}\log\mu_i = T\log\left(\frac{P}{T}\right) + \log\prod_{i=1}^{T}\mu_i$. (27)

Furthermore^16, if the $\mu_i$ are large enough we can approximate $\log\left(\frac{P}{T}\mu_i\right)$ as $\log\left(1 + \frac{P}{T}\mu_i\right)$, which gives us the Large Power and Moderate Eigenvalues Assumption (LPMEA), which has a capacity approximation similar to the SSVA

$C_{LPMEA} = \sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)$. (28)

Thus, whenever the conditions for the SSVA are met (that is, all the eigenvalues of H*H are approximately the same), or the conditions for the LPMEA are met (that is, P is much greater than the sum of the inverse eigenvalues of H*H and $\forall i,\ P\mu_i \gg T$), the same form of the approximation can be used:

$C_{\approx} = \sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)$. (29)

Therefore, for the remainder of the paper we use the notation $C_{\approx}$ when the conditions of either SSVA or LPMEA are assumed to have been met.
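The relationship between the exact capacity, the large power approximation (27), and the common form (29) can be illustrated with a short Python sketch. It assumes that every inverse eigenvalue is covered with water, so the water fill level is the total power divided by T plus the mean inverse eigenvalue. The function name `large_power_capacities` and the sample eigenvalues are hypothetical.

```python
import math

def large_power_capacities(mu, P):
    """Exact capacity, C_LPA (27) and C_approx (29) when every brick is covered."""
    T = len(mu)
    omega = P / T + sum(1.0 / m for m in mu) / T  # water fill level
    # The closed forms below only hold if the level clears the tallest brick.
    if omega <= max(1.0 / m for m in mu):
        raise ValueError("P is too small to cover all inverse eigenvalues")
    exact = sum(math.log2(omega * m) for m in mu)
    lpa = T * math.log2(P / T) + sum(math.log2(m) for m in mu)
    approx = sum(math.log2(1 + P / T * m) for m in mu)
    return exact, lpa, approx

# Hypothetical eigenvalues of H*H and a large total power.
exact, lpa, approx = large_power_capacities([4.0, 5.0], 1000.0)
print(exact, lpa, approx)
```

For these inputs the three values differ only in the third decimal place, with the large power approximation lowest and the exact capacity highest.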

3.3  Deterministic  Examples 

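We use a (2,2) MIMO channel. Keep in mind that $\mu_{1\downarrow} \ge \mu_{2\downarrow}$. We assume that we have enough power to cover with water both $1/\mu_{1\downarrow}$ and $1/\mu_{2\downarrow}$, that is $P > 1/\mu_{2\downarrow} - 1/\mu_{1\downarrow}$, so

$C = -2 + \log\left(1 + \left(P + \frac{1}{\mu_{2\downarrow}}\right)\mu_{1\downarrow}\right) + \log\left(1 + \left(P + \frac{1}{\mu_{1\downarrow}}\right)\mu_{2\downarrow}\right)$

$\phantom{C} = -2 + \log\left(1 + \left(P + \frac{1}{\mu_2}\right)\mu_1\right) + \log\left(1 + \left(P + \frac{1}{\mu_1}\right)\mu_2\right)$.

In our example, the water filling conditions are satisfied for P = 10 and $2 \le \mu_1, \mu_2 \le 7$. Therefore, we can view (Figure 1) the capacity as a function of $\{(\mu_1, \mu_2)\}$, with a natural symmetry. Or, we can view it (Figure 2) as a function on the "fundamental domain" $\{(\mu_{1\downarrow}, \mu_{2\downarrow})\}$ of $\{(\mu_1, \mu_2)\}$ under the "action" that swaps $\mu_1$ with $\mu_2$.

16 For large x we have that $\lim_{x \to \infty}(\log(1 + x) - \log(x)) = 0$.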

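If the inverse eigenvalue variance of H*H is small then we have

$C_{\approx} = \log\left(1 + \frac{P}{2}\mu_1\right) + \log\left(1 + \frac{P}{2}\mu_2\right)$

$\phantom{C_{\approx}} = \log\left[\left(1 + \frac{P}{2}\mu_1\right)\left(1 + \frac{P}{2}\mu_2\right)\right]$

$\phantom{C_{\approx}} = \log\left(1 + P\frac{\mu_1 + \mu_2}{2} + P^2\frac{\mu_1\mu_2}{4}\right)$.

This is illustrated in Figure 3 with P = 10 and the $\mu_i \in [2, 7]$.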

As we see, and not surprisingly, the analysis of approximations is directly related to the amount of power and the perturbations in inverse eigenvalues of H*H.

If all T inverse eigenvalues are equal, any non-zero P will suffice to give us a water fill level of $P/T + 1/\mu$. In this case, C and $C_{SSVA}$ are identical. This is well-illustrated in Figure 4. As the differences in inverse eigenvalues grow, so does the error. However, one must keep in mind that the difference in the inverse eigenvalues is inversely related to the difference in the actual eigenvalues. Therefore, if the eigenvalues are large, changing them does not have much of an effect upon the validity of the SSVA approximation. However, if the eigenvalues are small, then a slight change in them can result in a large error using the SSVA approximation, unless the power suitably grows.

In Figure 4 we illustrate how slight the approximation error is if P = 10 and the eigenvalues of H*H are constrained to the interval [2, 7]. Note that in this scenario, the maximum difference between inverse eigenvalues is 1/2 - 1/7 ≈ .36. If we keep the eigenvalues in the range in question, and force ourselves to have enough power to cover all the inverse eigenvalues with water, then this is about as bad as the approximation will get, which is not very bad at all. However, if we consider very small eigenvalues, then the situation changes.
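The size of the error for P = 10 and eigenvalues in [2, 7] can be checked with a short grid search. This is a sketch under the assumptions of this section: a (2,2) channel with both inverse eigenvalues covered, so that the closed-form capacity given earlier in this section applies. The function names and the grid resolution are hypothetical choices.

```python
import math

def c_exact_2x2(mu1, mu2, P):
    """Closed-form (2,2) capacity, valid when both bricks are covered."""
    return (-2 + math.log2(1 + (P + 1 / mu2) * mu1)
            + math.log2(1 + (P + 1 / mu1) * mu2))

def c_approx_2x2(mu1, mu2, P):
    """Equal-power approximation: P/2 allocated to each eigenvalue."""
    return math.log2(1 + P / 2 * mu1) + math.log2(1 + P / 2 * mu2)

def max_gap(P, lo, hi, steps=50):
    """Largest C - C_approx over a uniform grid of (mu1, mu2) in [lo, hi]^2."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(c_exact_2x2(a, b, P) - c_approx_2x2(a, b, P)
               for a in grid for b in grid)

gap = max_gap(10, 2, 7)
print(gap)
```

The largest gap occurs where one eigenvalue is 2 and the other is 7, and it is on the order of $10^{-3}$ bits, consistent with the scale reported for Figure 4. The gap vanishes when the two eigenvalues are equal.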

We  let  the  eigenvalues  be  in  [.0001,  .0100].  We  need  a 
minimal  power  greater  than  900,  to  cover  both  inverse 
eigenvalues.  If  we  choose  P  =  1000,  we  see  that  we  have  a 
very  large  error  (Figures  5  and  6).  However,  if  we  up  the 
power  to  P  =  10,000  we  see  in  Figures  7  and  8  that  we 
substantially  reduce  the  error,  and  we  can  continue  this 
process. In fact, if P = 100,000 the error is $O(10^{-3})$.

In  conclusion,  we  see  that  if  the  inverse  eigenvalues  are 
close  to  their  mean  value,  we  can  approximate  the  capacity, 
and  the  same  is  true  if  we  have  large  power. 


[Plot: Capacity MIMO 2,2 with P=10]
Figure 1. $C = -2 + \log\left(1 + \left(P + \frac{1}{\mu_2}\right)\mu_1\right) + \log\left(1 + \left(P + \frac{1}{\mu_1}\right)\mu_2\right)$.

[Plot: Capacity MIMO 2,2 with P=10]

[Plot: Capacity MIMO 2,2 with P=10]
Figure 3. SSVA: $C_{\approx}$, P = 10.

[Plot: DIFFERENCE MIMO 2,2 with P=10]
Figure 4. Difference: $O(10^{-3})$, $C - C_{\approx}$.

[Plot: Capacity MIMO 2,2 with P=1000]
Figure 5. Capacity, P = 1000.

[Plot: DIFFERENCE MIMO 2,2 with P=1000]
Figure 6. $C - C_{\approx}$, P = 1000.

[Plot: Capacity MIMO 2,2 with P=10000]
Figure 7. Capacity, P = 10,000.

[Plot: DIFFERENCE MIMO 2,2 with P=10000]
Figure 8. $C - C_{\approx}$, P = 10,000.
IV. Conclusion

We have examined the capacity formula for deterministic MIMO channels. We have put the analysis on a firm theoretical foundation and we have developed simple, closed form approximations for the capacity of the form

$\sum_{i=1}^{T}\log\left(1 + \frac{P}{T}\mu_i\right)$.

We  have  discussed  the  Telatar  conjecture,  and  have 
restructured  it.  Note  that  an  overall  theme  of  this  paper  has 
been  to  show  how  capacity,  in  both  the  deterministic  and 
probabilistic  cases,  is  the  natural  study  of  the  behavior  of 
inverse  eigenvalues. 